35 research outputs found

    MATEX: A Distributed Framework for Transient Simulation of Power Distribution Networks

    We propose MATEX, a distributed framework for transient simulation of power distribution networks (PDNs). MATEX uses a matrix exponential kernel with Krylov subspace approximations to solve the differential equations of linear circuits. First, the whole simulation task is divided into subtasks based on decompositions of the current sources, in order to reduce computational overhead. These subtasks are then distributed to different computing nodes and processed in parallel. Within each node, after the matrix factorization at the beginning of the simulation, the adaptive time-stepping solver runs without further matrix re-factorizations. MATEX overcomes the stiffness limitation of previous matrix exponential-based circuit simulators by using a rational Krylov subspace method, which permits larger step sizes with lower-dimensional Krylov subspace bases and greatly accelerates the whole computation. MATEX outperforms both traditional fixed and adaptive time-stepping methods, e.g., achieving around 13X speedup over the trapezoidal framework with fixed time step on the IBM power grid benchmarks. Comment: ACM/IEEE DAC 2014. arXiv admin note: substantial text overlap with arXiv:1505.0669
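To make the matrix exponential kernel concrete, the following is a minimal sketch of a Krylov subspace approximation of exp(hA)v. It uses the standard Arnoldi process, not the rational Krylov variant that MATEX relies on, and it ignores input sources and distribution across nodes. The test matrix `A` (a small RC-ladder-like tridiagonal system), the step `h`, the subspace dimension `m`, and all function names are assumptions made for illustration only.

```python
import numpy as np

def expm_small(M, terms=18):
    """Dense matrix exponential via scaling-and-squaring with a Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / (2.0 ** s)                  # scaled so the series converges fast
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ X / k
        E = E + term
    for _ in range(s):                  # undo the scaling by repeated squaring
        E = E @ E
    return E

def krylov_expm_apply(A, v, h, m):
    """Approximate exp(h*A) @ v in an m-dimensional Krylov subspace (Arnoldi)."""
    n = v.shape[0]
    beta = np.linalg.norm(v)
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):          # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        if j + 1 < m:
            nrm = np.linalg.norm(w)
            if nrm < 1e-12:             # subspace is invariant: stop early
                V, H = V[:, :j + 1], H[:j + 1, :j + 1]
                break
            H[j + 1, j] = nrm
            V[:, j + 1] = w / nrm
    e1 = np.zeros(H.shape[0])
    e1[0] = 1.0
    # exp(h*A) v  ~=  beta * V * exp(h*H) * e1, with H only m x m
    return beta * (V @ (expm_small(h * H) @ e1))

# Hypothetical test system x' = A x (assumed, not taken from the paper).
n, h, m = 50, 0.5, 20
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
v = np.linspace(1.0, 2.0, n)

approx_x = krylov_expm_apply(A, v, h, m)
exact_x = expm_small(h * A) @ v
max_error = np.max(np.abs(approx_x - exact_x))
```

The point of the construction is that the exponential is only ever taken of the small matrix `H`, so the cost per step is dominated by matrix-vector products with `A`.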

    Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration

    State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution. Such networks strain the computational capabilities and energy available to embedded and mobile processing platforms, restricting their use in many important applications. In this paper, we push the boundaries of hardware-effective CNN design by proposing BCNN with Separable Filters (BCNNw/SF), which applies Singular Value Decomposition (SVD) to BCNN kernels to further reduce computational and storage complexity. To enable its implementation, we provide a closed form of the gradient over SVD to calculate the exact gradient with respect to every binarized weight in backward propagation. We verify BCNNw/SF on the MNIST, CIFAR-10, and SVHN datasets, and implement an accelerator for CIFAR-10 on FPGA hardware. Our BCNNw/SF accelerator realizes memory savings of 17% and an execution time reduction of 31.3% compared to BCNN, with only minor accuracy sacrifices. Comment: 9 pages, 6 figures, accepted for Embedded Vision Workshop (CVPRW
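As a rough illustration of why an SVD of a kernel can reduce computation, the sketch below takes the rank-1 SVD approximation of a small kernel and shows that filtering with it equals a vertical 1-D pass followed by a horizontal 1-D pass. It does not reproduce the paper's binarization of the factors or its closed-form gradient; the ±1 kernel, the test image, and the function names are hypothetical.

```python
import numpy as np

def rank1_separable(kernel):
    """Return column and row vectors whose outer product is the best rank-1 fit."""
    U, S, Vt = np.linalg.svd(kernel)
    scale = np.sqrt(S[0])               # split the singular value between factors
    return U[:, 0] * scale, Vt[0, :] * scale

def correlate2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, written out for clarity."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 3x3 kernel with +/-1 weights, as in a binarized network.
kernel = np.array([[1.0, -1.0, 1.0],
                   [1.0, 1.0, -1.0],
                   [-1.0, 1.0, 1.0]])
col, row = rank1_separable(kernel)
approx = np.outer(col, row)             # rank-1 (separable) approximation

image = np.arange(36, dtype=float).reshape(6, 6)
full = correlate2d_valid(image, approx)
# Same result from two 1-D passes: 3x1 column filter, then 1x3 row filter.
two_pass = correlate2d_valid(correlate2d_valid(image, col[:, None]), row[None, :])
```

For a k-by-k kernel the two-pass form needs 2k multiplications per output pixel instead of k*k, and stores 2k weights instead of k*k; the price is the approximation error of the discarded singular values.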

    Resource Efficient and Error Resilient Neural Networks

    The entangled guardbands on timing specification and energy budget protect a system against faults, but they also impede progress toward higher throughput and energy efficiency. To combat over-designed guardbands in systems that perform deep learning inference, we examine the algorithmic demands and find that resource deficiency and hardware variation are the major reasons conservative guardbands are needed. In modern convolutional neural networks (CNNs), the number of arithmetic operations for inference can exceed tens of billions, which requires a sophisticated buffering mechanism to balance resource utilization against throughput. In this case, over-designed guardbands can seriously hinder system performance. On the other hand, timing errors can be caused by hardware variations, including momentary voltage droops resulting from simultaneous switching noise, a gradually decreasing voltage level due to a limited battery, and reduced electron mobility caused by the dissipation of system power into heat. Timing errors propagating through a network can start small and snowball into a catastrophic loss of accuracy. Knowing that the need for guardbands originates from resource deficiency and timing errors, this dissertation focuses on cross-layer solutions to two problems: the high algorithmic demands of deep learning methods and the error vulnerability caused by hardware variations. We begin by reviewing methods and technologies proposed in the literature, including weight encoding, filter decomposition, network pruning, efficient structure design, and precision quantization. While implementing an FPGA accelerator for the extreme case of quantization, binarized neural networks (BNNs), we realized that further optimizations could be applied. We then extend BNNs at the algorithmic layer with binarized separable filters and propose BCNNw/SF.
    Although quantization and approximation benefit hardware efficiency to a certain extent, the optimal reduction or compression rate is still limited by the core operation of conventional deep learning methods: convolution. We therefore introduce the local binary pattern (LBP) to deep learning because of its low complexity and high effectiveness. We name the new algorithm LBPNet; its feature maps are created using comparisons, in a fashion similar to traditional LBP. LBPNet can be trained with the forward-backward propagation algorithm to extract useful features for image classification. LBPNet accelerators have been implemented and optimized to verify their classification performance, processing throughput, and energy efficiency. We also show that LBPNet has the strongest error immunity among the MLP, CNN, and BCNN models studied: the classification accuracy of LBPNet decreases by only 10%, whereas all the other models lose their classification ability, when the timing error rate exceeds 0.01
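The comparison-based feature extraction mentioned above can be illustrated with the traditional 8-neighbor LBP operator, sketched below. This is the classic hand-crafted operator, not the trainable LBPNet layer described in the dissertation; the neighbor ordering, the `>=` comparison convention, and the function names are assumptions chosen for the example.

```python
# Offsets of the 8 neighbors, clockwise from the top-left pixel (assumed order).
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """Pack the 8 neighbor-vs-center comparisons at (r, c) into one byte."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        if image[r + dr][c + dc] >= center:   # comparison only, no multiply
            code |= 1 << bit
    return code

def lbp_map(image):
    """LBP code for every interior pixel of a 2-D list of intensities."""
    rows, cols = len(image), len(image[0])
    return [[lbp_code(image, r, c) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]

# Tiny hypothetical image used only to exercise the functions.
sample = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
sample_codes = lbp_map(sample)
```

Each output value needs only eight comparisons and bit operations, which is the low-complexity property that makes LBP attractive for hardware compared with multiply-accumulate convolution.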

    Algorithm for Determining Eye-Diagram Characteristics of Lossy Transmission Lines

    The major theme of this thesis is a methodology for predicting the eye diagram of lossy transmission lines. The finite conductivity of the metal conductor and the loss due to displacement current and leakage current in the dielectric are the two primary factors that affect the eye diagram at the receiving end of a transmission line. After obtaining the two parameters that determine the impulse response by examining the physical loss mechanisms of the transmission line, this thesis applies time-domain signal processing to develop a method for predicting the eye diagram quickly and accurately. The thesis proposes the input signal pattern that generates the worst eye diagram at the receiving end when the mismatch effects at both ends are not significant or the loss of the transmission line is pronounced. Because the frequency-domain transmission line model, combined with element values derived from the loss mechanisms, is non-causal in the time domain, the thesis proposes an approximate mathematical solution that applies the Kramers-Kronig relations to resolve the causality problem. Using the two parameters of the causal impulse response, the thesis develops the prediction procedure. Finally, measurements in the frequency domain with a vector network analyzer and in the time domain with a digital oscilloscope and pattern generator demonstrate the accuracy of this methodology.
    Contents:
    Chapter 1. Motivation and Introduction: 1-1 Motivation; 1-2 Literature Review; 1-3 Chapter Overview; 1-4 Contributions
    Chapter 2. Principles of Eye-Diagram Acquisition and Related Eye-Diagram Specifications: 2-1 Principle of Eye-Diagram Acquisition; 2-2 Problem Description (PRBS vs 1 bit); 2-3 Eye-Diagram Specifications
    Chapter 3. Theoretical Derivation of the Transfer Function/Impulse Response of Lossy Transmission Lines: 3-1 Lossy Transmission Line Theory; 3-2 Transfer Function; 3-3 Complete Impulse Response; 3-4 Using the K-K Relations to Resolve Non-Causality; 3-5 Approximate Step Response
    Chapter 4. Simulation Analysis of the Receiving-End Eye Diagram of Lossy Transmission Lines: 4-1 Reducing Variables through Normalization; 4-2 Methods for Obtaining the Parameters A and c; 4-3 Predicting the Eye Diagram Using the Two Parameters An and cn; 4-4 Mismatched Cases; 4-4 Evaluating the Usable Length of Lossy Transmission Lines (A c)
    Chapter 5. Experimental Verification: 5-1 Experimental Design and Measurement Environment; 5-2 Frequency-Domain Measurements; 5-3 Time-Domain Measurements
    Chapter 6. Conclusions: 6-1 Conclusions; 6-2 Future Work
    References

    System Error Prediction for Business Support Systems in Telecommunications Networks
